In [1]:
import graphlab

Import data


In [2]:
graphlab.product_key.get_product_key()


Out[2]:
'0E3B-DE26-6D97-A768-D311-CF10-48C1-19AE'

In [2]:
song_data = graphlab.SFrame('song_data.gl/')


[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1497431985.log

In [4]:
song_data.head()


Out[4]:
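+----------------------------------------------+--------------------+--------------+-----------------------------------+-------------------------------------------------+
| user_id                                      | song_id            | listen_count | title                             | artist                                          |
+----------------------------------------------+--------------------+--------------+-----------------------------------+-------------------------------------------------+
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOAKIMP12A8C130995 | 1            | The Cove                          | Jack Johnson                                    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBBMDR12A8C13253B | 2            | Entre Dos Aguas                   | Paco De Lucia                                   |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBXHDL12A81C204C0 | 1            | Stronger                          | Kanye West                                      |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBYHAJ12A6701BF1D | 1            | Constellations                    | Jack Johnson                                    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODACBL12A8C13C273 | 1            | Learn To Fly                      | Foo Fighters                                    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODDNQT12A6D4F5F7E | 5            | Apuesta Por El Rock 'N' Roll ...  | Héroes del Silencio                             |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODXRTY12AB0180F3B | 1            | Paper Gangsta                     | Lady GaGa                                       |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOFGUAY12AB017B0A8 | 1            | Stacked Actors                    | Foo Fighters                                    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOFRQTD12A81C233C0 | 1            | Sehr kosmisch                     | Harmonia                                        |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOHQWYZ12A6D4FA701 | 1            | Heaven's gonna burn your eyes ... | Thievery Corporation feat. Emiliana Torrini ... |
+----------------------------------------------+--------------------+--------------+-----------------------------------+-------------------------------------------------+
+-----------------------------------------------+
| song                                          |
+-----------------------------------------------+
| The Cove - Jack Johnson                       |
| Entre Dos Aguas - Paco De Lucia ...           |
| Stronger - Kanye West                         |
| Constellations - Jack Johnson ...             |
| Learn To Fly - Foo Fighters ...               |
| Apuesta Por El Rock 'N' Roll - Héroes del ... |
| Paper Gangsta - Lady GaGa                     |
| Stacked Actors - Foo Fighters ...             |
| Sehr kosmisch - Harmonia                      |
| Heaven's gonna burn your eyes - Thievery ...  |
+-----------------------------------------------+
[10 rows x 6 columns]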


In [5]:
graphlab.canvas.set_target('ipynb')

In [6]:
song_data['song'].show()



In [10]:
len(song_data)


Out[10]:
1116609

Count the number of unique users


In [11]:
users = song_data['user_id'].unique()
len(users)


Out[11]:
66346

Create a song recommender


In [12]:
train_data, test_data = song_data.random_split(0.8, seed=0)
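
To illustrate what a seeded random split does, here is a minimal plain-Python sketch. It assumes each row is sent to the first part independently with probability 0.8; that is an assumption about the general technique, not a description of GraphLab's internals, and the helper name random_split_rows is hypothetical. The training log further down reports 893580 of the 1116609 rows in train_data (about 80.03%), which is at least consistent with a per-row random assignment rather than an exact 80% cut.

```python
import random


def random_split_rows(rows, fraction, seed=None):
    """Send each row to the first list with probability `fraction` (hypothetical helper)."""
    rng = random.Random(seed)  # a seeded generator makes the split repeatable
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second


rows = list(range(1000))  # toy stand-in for the rows of song_data
train_rows, test_rows = random_split_rows(rows, 0.8, seed=0)
train_again, test_again = random_split_rows(rows, 0.8, seed=0)
```

Calling the helper twice with the same seed gives the same two lists, which is why fixing seed=0 keeps the later results comparable across runs.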

Popularity model


In [15]:
popularity_model = graphlab.popularity_recommender.create(train_data, user_id='user_id', item_id='song')


Recsys training: model = popularity
Warning: Ignoring columns song_id, listen_count, title, artist;
    To use one of these as a target column, set target = 
    and use a method that allows the use of a target.
Preparing data set.
    Data has 893580 observations with 66085 users and 9952 items.
    Data prepared in: 0.905357s
893580 observations to process; with 9952 unique items.
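
As a rough picture of what a popularity recommender does, the sketch below ranks songs by how many training rows mention them and skips songs the user already has. It is plain Python on toy data, not GraphLab's implementation; the function popularity_recommend and the toy observations are hypothetical, and treating the score as a row count is an assumption (it would fit whole-number scores such as 4754.0).

```python
from collections import Counter


def popularity_recommend(observations, user, k=10):
    """Return the k most frequent items that `user` has not interacted with yet."""
    counts = Counter(item for _, item in observations)
    seen = set(item for u, item in observations if u == user)
    ranked = [(item, n) for item, n in counts.most_common() if item not in seen]
    return ranked[:k]


# Toy (user, song) rows standing in for train_data
observations = [
    ("u1", "A"), ("u2", "A"), ("u3", "A"),
    ("u1", "B"), ("u2", "B"),
    ("u3", "C"),
]
recs_new_user = popularity_recommend(observations, "u4", k=2)
recs_u1 = popularity_recommend(observations, "u1", k=2)
```

Because the ranking ignores who is asking, two users differ only in which songs get filtered out as already known.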

Use the popularity model to make recommendations


In [16]:
popularity_model.recommend(users=[users[0]])


Out[16]:
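+----------------------------------------------+-----------------------------------------------------+--------+------+
| user_id                                      | song                                                | score  | rank |
+----------------------------------------------+-----------------------------------------------------+--------+------+
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Sehr kosmisch - Harmonia                            | 4754.0 | 1    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Undo - Björk                                        | 4227.0 | 2    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | You're The One - Dwight Yoakam ...                  | 3781.0 | 3    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Dog Days Are Over (Radio Edit) - Florence + The ... | 3633.0 | 4    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Revelry - Kings Of Leon                             | 3527.0 | 5    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Horn Concerto No. 4 in E flat K495: II. Romance ... | 3161.0 | 6    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Secrets - OneRepublic                               | 3148.0 | 7    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Hey_ Soul Sister - Train                            | 2538.0 | 8    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Fireflies - Charttraxx Karaoke ...                  | 2532.0 | 9    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Tive Sim - Cartola                                  | 2521.0 | 10   |
+----------------------------------------------+-----------------------------------------------------+--------+------+
[10 rows x 4 columns]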


In [17]:
popularity_model.recommend(users=[users[1]])


Out[17]:
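+----------------------------------------------+-----------------------------------------------------+--------+------+
| user_id                                      | song                                                | score  | rank |
+----------------------------------------------+-----------------------------------------------------+--------+------+
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Sehr kosmisch - Harmonia                            | 4754.0 | 1    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Undo - Björk                                        | 4227.0 | 2    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | You're The One - Dwight Yoakam ...                  | 3781.0 | 3    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Dog Days Are Over (Radio Edit) - Florence + The ... | 3633.0 | 4    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Revelry - Kings Of Leon                             | 3527.0 | 5    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Horn Concerto No. 4 in E flat K495: II. Romance ... | 3161.0 | 6    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Secrets - OneRepublic                               | 3148.0 | 7    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Hey_ Soul Sister - Train                            | 2538.0 | 8    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Fireflies - Charttraxx Karaoke ...                  | 2532.0 | 9    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Tive Sim - Cartola                                  | 2521.0 | 10   |
+----------------------------------------------+-----------------------------------------------------+--------+------+
[10 rows x 4 columns]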

Build a song recommender with personalization


In [18]:
personalized_model = graphlab.item_similarity_recommender.create(train_data, user_id='user_id', item_id='song')


Recsys training: model = item_similarity
Warning: Ignoring columns song_id, listen_count, title, artist;
    To use one of these as a target column, set target = 
    and use a method that allows the use of a target.
Preparing data set.
    Data has 893580 observations with 66085 users and 9952 items.
    Data prepared in: 0.895043s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 4.976ms                        | 1.5        |
| 30.306ms                       | 100        |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in one pass using dense lookup tables.
+-------------------------------------+------------------+-----------------+
| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |
+-------------------------------------+------------------+-----------------+
| 251.994ms                           | 0                | 0               |
| 695.193ms                           | 100              | 9952            |
+-------------------------------------+------------------+-----------------+
Finalizing lookup tables.
Generating candidate set for working with new users.
Finished training in 1.75653s
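
For intuition about item similarity, the sketch below scores two songs by the Jaccard similarity of their listener sets: the number of users who played both, divided by the number who played either. This is plain Python on toy data and only an illustration of the technique. The notebook does not set a similarity type, so whether the model above uses Jaccard depends on the library default (an assumption here), and the names jaccard, similar_items and listeners_by_song are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of the intersection over size of the union."""
    union = a | b
    if not union:
        return 0.0
    return float(len(a & b)) / len(union)


def similar_items(listeners_by_song, song, k=10):
    """Rank the other songs by Jaccard similarity of their listener sets to `song`."""
    target = listeners_by_song[song]
    scores = [(other, jaccard(target, users))
              for other, users in listeners_by_song.items() if other != song]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy data: each song maps to the set of users who listened to it
listeners_by_song = {
    "A": {"u1", "u2", "u3"},
    "B": {"u2", "u3"},
    "C": {"u3", "u4"},
    "D": {"u5"},
}
top_for_a = similar_items(listeners_by_song, "A", k=2)
```

Here song B shares two of three listeners with A (score 2/3), while C shares one of four (score 0.25), so B ranks first. With thousands of listeners per song, overlaps are proportionally small, which would fit the low absolute scores in the outputs that follow.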

Apply the personalized model


In [19]:
personalized_model.recommend(users=[users[0]])


Out[19]:
+----------------------------------------------+--------------------------------------------------+-----------------+------+
| user_id                                      | song                                             | score           | rank |
+----------------------------------------------+--------------------------------------------------+-----------------+------+
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Riot In Cell Block Number Nine - Dr Feelgood ... | 0.0374999940395 | 1    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Sei Lá Mangueira - Elizeth Cardoso ...           | 0.0331632643938 | 2    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | The Stallion - Ween                              | 0.0322580635548 | 3    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Rain - Subhumans                                 | 0.0314159244299 | 4    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | West One (Shine On Me) - The Ruts ...            | 0.0306771993637 | 5    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Back Against The Wall - Cage The Elephant ...    | 0.0301204770803 | 6    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Life Less Frightening - Rise Against ...         | 0.0284431129694 | 7    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | A Beggar On A Beach Of Gold - Mike And The ...   | 0.0230024904013 | 8    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Audience Of One - Rise Against ...               | 0.0193938463926 | 9    |
| 279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | Blame It On The Boogie - The Jacksons ...        | 0.0189873427153 | 10   |
+----------------------------------------------+--------------------------------------------------+-----------------+------+
[10 rows x 4 columns]


In [20]:
personalized_model.recommend(users=[users[1]])


Out[20]:
+----------------------------------------------+-----------------------------------------------------+-----------------+------+
| user_id                                      | song                                                | score           | rank |
+----------------------------------------------+-----------------------------------------------------+-----------------+------+
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Grind With Me (Explicit Version) - Pretty Ricky ... | 0.0459424376488 | 1    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | There Goes My Baby - Usher ...                      | 0.0331920742989 | 2    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Panty Droppa [Intro] (Album Version) - Trey ...     | 0.0318566203117 | 3    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Nobody (Featuring Athena Cage) (LP Version) - ...   | 0.0278467655182 | 4    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Youth Against Fascism - Sonic Youth ...             | 0.0262914180756 | 5    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Nice & Slow - Usher                                 | 0.0239639401436 | 6    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Making Love (Into The Night) - Usher ...            | 0.0238176941872 | 7    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Naked - Marques Houston                             | 0.0228925704956 | 8    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | I.nner Indulgence - DESTRUCTION ...                 | 0.0220767498016 | 9    |
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Love Lost (Album Version) - Trey Songz ...          | 0.0204497694969 | 10   |
+----------------------------------------------+-----------------------------------------------------+-----------------+------+
[10 rows x 4 columns]


In [21]:
personalized_model.get_similar_items(['With Or Without You - U2'])


Out[21]:
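+--------------------------+------------------------------------------------+-----------------+------+
| song                     | similar                                        | score           | rank |
+--------------------------+------------------------------------------------+-----------------+------+
| With Or Without You - U2 | I Still Haven't Found What I'm Looking For ... | 0.042857170105  | 1    |
| With Or Without You - U2 | Hold Me_ Thrill Me_ Kiss Me_ Kill Me - U2 ...  | 0.0337349176407 | 2    |
| With Or Without You - U2 | Window In The Skies - U2                       | 0.0328358411789 | 3    |
| With Or Without You - U2 | Vertigo - U2                                   | 0.0300751924515 | 4    |
| With Or Without You - U2 | Sunday Bloody Sunday - U2                      | 0.0271317958832 | 5    |
| With Or Without You - U2 | Bad - U2                                       | 0.0251798629761 | 6    |
| With Or Without You - U2 | A Day Without Me - U2                          | 0.0237154364586 | 7    |
| With Or Without You - U2 | Another Time Another Place - U2 ...            | 0.0203251838684 | 8    |
| With Or Without You - U2 | Walk On - U2                                   | 0.0202020406723 | 9    |
| With Or Without You - U2 | Get On Your Boots - U2                         | 0.0196850299835 | 10   |
+--------------------------+------------------------------------------------+-----------------+------+
[10 rows x 4 columns]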

Quantitative comparison between models


In [17]:
%matplotlib inline

In [22]:
model_performance = graphlab.recommender.util.compare_models(
    test_data,
    [popularity_model, personalized_model],
    user_sample=0.05)


compare_models: using 2931 users to estimate model performance
PROGRESS: Evaluate model M0
recommendations finished on 1000/2931 queries. users per second: 22889.1
recommendations finished on 2000/2931 queries. users per second: 25915.5
Precision and recall summary statistics by cutoff
+--------+-----------------+------------------+
| cutoff |  mean_precision |   mean_recall    |
+--------+-----------------+------------------+
|   1    | 0.0330945069942 | 0.00854915583677 |
|   2    | 0.0296827021494 | 0.0149519762621  |
|   3    | 0.0255885363357 | 0.0202633022234  |
|   4    |  0.022944387581 | 0.0235740514272  |
|   5    | 0.0210167178437 | 0.0267819394866  |
|   6    | 0.0201296485841 | 0.0313728061809  |
|   7    | 0.0188136667154 | 0.0339576668261  |
|   8    | 0.0182105083589 | 0.0375452313144  |
|   9    | 0.0175518404792 | 0.0412916763096  |
|   10   | 0.0165131354487 | 0.0430186994286  |
+--------+-----------------+------------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1
recommendations finished on 1000/2931 queries. users per second: 22976.4
recommendations finished on 2000/2931 queries. users per second: 25563.4
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    |  0.188331627431 | 0.0552799035256 |
|   2    |  0.154895939952 | 0.0862256051069 |
|   3    |  0.138746730354 |  0.113285811287 |
|   4    |  0.125383828045 |  0.136213602818 |
|   5    |  0.114090754009 |  0.154693798791 |
|   6    |  0.103377686796 |  0.168420379507 |
|   7    | 0.0971876980065 |  0.184159797266 |
|   8    | 0.0903275332651 |  0.194421743327 |
|   9    | 0.0844232154365 |  0.204325799378 |
|   10   | 0.0797338792221 |  0.214664021091 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]
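
To make the two columns concrete, the sketch below computes precision and recall at a cutoff k for a single user: precision is the share of the top-k recommendations that the user actually listened to in the held-out data, and recall is the share of the user's held-out songs that appear in the top k. The tables above report these values averaged over the sampled users. The function name and toy data are hypothetical, and this is not GraphLab's code. Given the order of the model list, M0 is the popularity model and M1 the personalized one, so at cutoff 1 the mean precision is about 0.033 versus 0.188.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the first k recommendations against the relevant set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = float(hits) / k
    recall = float(hits) / len(relevant) if relevant else 0.0
    return precision, recall


recommended = ["A", "B", "C", "D", "E"]  # ranked recommendations for one user
relevant = {"B", "E", "Z"}               # songs that user played in the test data

p2, r2 = precision_recall_at_k(recommended, relevant, 2)
p5, r5 = precision_recall_at_k(recommended, relevant, 5)
```

Raising the cutoff from 2 to 5 lowers precision (0.5 to 0.4) and raises recall (1/3 to 2/3) in this toy case, the same trade-off visible down each table.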

Assignment

1. Compute the number of unique users for each of these artists: 'Kanye West', 'Foo Fighters', 'Taylor Swift' and 'Lady GaGa'.

Save these results to answer the quiz at the end.



Kanye West


In [20]:
len(song_data[song_data['artist'] == 'Kanye West']['user_id'].unique())


Out[20]:
2522

Foo Fighters


In [21]:
len(song_data[song_data['artist'] == 'Foo Fighters']['user_id'].unique())


Out[21]:
2055

Taylor Swift


In [22]:
len(song_data[song_data['artist'] == 'Taylor Swift']['user_id'].unique())


Out[22]:
3246

Lady GaGa


In [23]:
len(song_data[song_data['artist'] == 'Lady GaGa']['user_id'].unique())


Out[23]:
2928
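
The four cells above repeat the same filter-and-count expression. As a sketch of the same idea in a single pass, the plain-Python helper below collects the set of user IDs per artist and reports the set sizes. It runs on toy rows rather than on the SFrame, and the helper name unique_users_per_artist is hypothetical.

```python
from collections import defaultdict


def unique_users_per_artist(rows, artists):
    """Count distinct user ids for each requested artist in one pass over (user_id, artist) rows."""
    users = defaultdict(set)
    for user_id, artist in rows:
        if artist in artists:
            users[artist].add(user_id)  # a set ignores repeat listens by the same user
    return {artist: len(users[artist]) for artist in artists}


# Toy (user_id, artist) rows standing in for song_data
rows = [
    ("u1", "Kanye West"), ("u1", "Kanye West"), ("u2", "Kanye West"),
    ("u2", "Foo Fighters"), ("u3", "Jack Johnson"),
]
counts = unique_users_per_artist(rows, ["Kanye West", "Foo Fighters", "Taylor Swift"])
```

An artist with no matching rows comes back as 0 rather than being left out.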

2. Find the most popular and least popular artists by total listen count.


In [24]:
artist_songs = song_data.groupby(
    key_columns='artist',
    operations={'total_count': graphlab.aggregate.SUM('listen_count')})

In [25]:
artist_songs


Out[25]:
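+-----------------------------------------------------+-------------+
| artist                                              | total_count |
+-----------------------------------------------------+-------------+
| The Dells                                           | 274         |
| 16Volt                                              | 579         |
| The Stray Cats                                      | 411         |
| Billy Preston / Syreeta                             | 189         |
| Emma Shapplin                                       | 252         |
| Lil Jon & The East Side Boyz / Ludacris / Usher ... | 256         |
| Spoon                                               | 1061        |
| Sam & Dave                                          | 656         |
| Blue Swede                                          | 266         |
| Scooter                                             | 1202        |
+-----------------------------------------------------+-------------+
[3375 rows x 2 columns]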
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.


In [26]:
artist_songs_sorted = artist_songs.sort('total_count', ascending=False)

In [27]:
artist_songs_sorted[0]


Out[27]:
{'artist': 'Kings Of Leon', 'total_count': 43218}

In [28]:
artist_songs_sorted[-1]


Out[28]:
{'artist': 'William Tabbert', 'total_count': 14}
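
The group-by, sum and sort steps above can be mirrored in plain Python, which may help when checking the logic on a small sample. The sketch below is only an illustration on toy rows; total_listens_by_artist is a hypothetical helper, not part of GraphLab.

```python
from collections import defaultdict


def total_listens_by_artist(rows):
    """Sum listen counts per artist and return (artist, total) pairs, largest total first."""
    totals = defaultdict(int)
    for artist, listen_count in rows:
        totals[artist] += listen_count
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Toy (artist, listen_count) rows standing in for song_data
rows = [("X", 3), ("Y", 1), ("X", 4), ("Z", 10), ("Y", 1)]
ranked = total_listens_by_artist(rows)
most_popular, least_popular = ranked[0], ranked[-1]
```

As in the notebook, the first and last entries of the sorted result give the most and least listened-to artists.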

3. Find the song most often recommended as the top pick for a subset of test users.


In [29]:
train_data, test_data = song_data.random_split(0.8, seed=0)

In [30]:
item_similarity_recommender = graphlab.item_similarity_recommender.create(train_data, user_id='user_id', item_id='song')


Recsys training: model = item_similarity
Warning: Ignoring columns song_id, listen_count, title, artist;
    To use one of these as a target column, set target = 
    and use a method that allows the use of a target.
Preparing data set.
    Data has 893580 observations with 66085 users and 9952 items.
    Data prepared in: 0.850904s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 1.173ms                        | 1.5        |
| 26.965ms                       | 100        |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in one pass using dense lookup tables.
+-------------------------------------+------------------+-----------------+
| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |
+-------------------------------------+------------------+-----------------+
| 520.589ms                           | 0                | 0               |
| 975.835ms                           | 100              | 9952            |
+-------------------------------------+------------------+-----------------+
Finalizing lookup tables.
Generating candidate set for working with new users.
Finished training in 1.03486s

In [31]:
subset_test_users = test_data['user_id'].unique()[0:10000]

In [32]:
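# personalized_model was created with the same call and training data as item_similarity_recommender above
subset_test_recommendations = personalized_model.recommend(subset_test_users, k=1)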


recommendations finished on 1000/10000 queries. users per second: 10605.2
recommendations finished on 2000/10000 queries. users per second: 14612.3
recommendations finished on 3000/10000 queries. users per second: 16746.5
recommendations finished on 4000/10000 queries. users per second: 18569.5
recommendations finished on 5000/10000 queries. users per second: 19956.6
recommendations finished on 6000/10000 queries. users per second: 21396.9
recommendations finished on 7000/10000 queries. users per second: 22030.2
recommendations finished on 8000/10000 queries. users per second: 22398.8
recommendations finished on 9000/10000 queries. users per second: 23149.9
recommendations finished on 10000/10000 queries. users per second: 23469.2

In [33]:
subset_test_recommendations.head()


Out[33]:
+----------------------------------------------+-------------------------------------------------------+-----------------+------+
| user_id                                      | song                                                  | score           | rank |
+----------------------------------------------+-------------------------------------------------------+-----------------+------+
| c067c22072a17d33310d7223d7b79f819e48cf42 ... | Grind With Me (Explicit Version) - Pretty Ricky ...   | 0.0459424376488 | 1    |
| 696787172dd3f5169dc94deef97e427cee86147d ... | Senza Una Donna (Without A Woman) - Zucchero / ...    | 0.017026577677  | 1    |
| 532e98155cbfd1e1a474a28ed96e59e50f7c5baf ... | Jive Talkin' (Album Version) - Bee Gees ...           | 0.0118288653237 | 1    |
| 18325842a941bc58449ee71d659a08d1c1bd2383 ... | Goodnight And Goodbye - Jonas Brothers ...            | 0.0159257985651 | 1    |
| 507433946f534f5d25ad1be302edb9a2376f503c ... | Find The Cost Of Freedom - Crosby_ Stills_ Nash & ... | 0.0165806589303 | 1    |
| 18fafad477f9d72ff86f7d0bd838a6573de0f64a ... | Rabbit Heart (Raise It Up) - Florence + The ...       | 0.0799399726093 | 1    |
| fe85b96ba1983219b296f6b4869dd29eb2b72ff9 ... | Secrets - OneRepublic                                 | 0.0788827141126 | 1    |
| 225ea420b4bede50919d1bfe24a599691522d176 ... | Clocks - Coldplay                                     | 0.0271030251796 | 1    |
| 95dc7e2b188b1148b2d25f4e6b6e94afacc4efc3 ... | Bust a Move - Infected Mushroom ...                   | 0.0534738540649 | 1    |
| 4a3a1ae2748f12f7ab921a47d6d79abf82e3e325 ... | Isis (Spam Remix) - Alaska Y Dinarama ...             | 0.04180302118   | 1    |
+----------------------------------------------+-------------------------------------------------------+-----------------+------+
[10 rows x 4 columns]


In [34]:
recommendations_song = subset_test_recommendations.groupby(
    key_columns='song',
    operations={'num_recommendations': graphlab.aggregate.COUNT()})

In [35]:
recommendations_song


Out[35]:
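+-----------------------------------------------------+---------------------+
| song                                                | num_recommendations |
+-----------------------------------------------------+---------------------+
| Arco Arena - Cake                                   | 1                   |
| Too Deep - Girl Talk                                | 2                   |
| Guys Like Me - Eric Church ...                      | 2                   |
| Freedom - Akon                                      | 2                   |
| Nomenclature - Andrew Bird ...                      | 1                   |
| Wish You Were Here - Incubus ...                    | 1                   |
| Change - Blind Melon                                | 1                   |
| Get:On - Moguai                                     | 1                   |
| Pitter-Pat - Erin McCarley ...                      | 3                   |
| Dog Days Are Over (Radio Edit) - Florence + The ... | 29                  |
+-----------------------------------------------------+---------------------+
[3145 rows x 2 columns]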
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.


In [36]:
recommendations_song_sorted = recommendations_song.sort('num_recommendations', ascending=False)

In [37]:
recommendations_song_sorted


Out[37]:
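+-----------------------------------------------------+---------------------+
| song                                                | num_recommendations |
+-----------------------------------------------------+---------------------+
| Undo - Björk                                        | 438                 |
| Secrets - OneRepublic                               | 374                 |
| Revelry - Kings Of Leon                             | 222                 |
| You're The One - Dwight Yoakam ...                  | 161                 |
| Fireflies - Charttraxx Karaoke ...                  | 110                 |
| Sehr kosmisch - Harmonia                            | 103                 |
| Hey_ Soul Sister - Train                            | 100                 |
| Horn Concerto No. 4 in E flat K495: II. Romance ... | 90                  |
| OMG - Usher featuring will.i.am ...                 | 60                  |
| Bigger - Justin Bieber                              | 43                  |
+-----------------------------------------------------+---------------------+
[3145 rows x 2 columns]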
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
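
A count like this can also be read as a share of users. In the output above, the top song is the first recommendation for 438 of the 10000 sampled users, about 4.4%. The plain-Python sketch below shows the same counting step plus that share on toy data; the variable names and toy pairs are hypothetical.

```python
from collections import Counter

# Toy (user, top-1 recommended song) pairs standing in for subset_test_recommendations
top1_recommendations = [
    ("u1", "S1"), ("u2", "S2"), ("u3", "S1"),
    ("u4", "S1"), ("u5", "S3"), ("u6", "S2"),
]

song_counts = Counter(song for _, song in top1_recommendations)
most_recommended = song_counts.most_common(2)  # two most frequent songs, largest count first

top_song, top_count = most_recommended[0]
top_share = float(top_count) / len(top1_recommendations)  # fraction of users given the top song
```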

